[Feat(tests)] multi-worker freestyle mock #1430
Conversation
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (1)
📝 Walkthrough
Tests for failed-emails digest now poll the dry-run endpoint and filter mailbox-specific batches; the freestyle-mock Dockerfile adds a worker thread script and a queued WorkerPool with npm-install orchestration and safer HTTP request handling.

Changes: Test infrastructure and worker pool architecture
Sequence Diagram

```mermaid
sequenceDiagram
    participant HTTPServer
    participant WorkerPool
    participant Worker
    HTTPServer->>WorkerPool: run(jobConfig)
    WorkerPool->>WorkerPool: select idle slot or enqueue
    WorkerPool->>Worker: dispatch(job)
    Worker->>Worker: apply envVars, capture console, execute script
    Worker->>WorkerPool: postMessage {status, result or error, logs}
    WorkerPool->>HTTPServer: resolve or reject with {result|error, logs}
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Pull request overview
This PR aims to reduce e2e flakiness caused by timeouts during email polling by increasing the throughput of the freestyle-mock execution service, and by making the failed-emails digest test poll deterministically instead of using a fixed sleep.
Changes:
- Replace the inline single-threaded freestyle-mock execution model with a `worker_threads` pool to run multiple scripts in parallel.
- Add a polling helper to the `failed-emails-digest` e2e tests to wait until expected failures are recorded, removing a fixed sleep.
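The poll-until-recorded idea can be sketched in isolation. The helper name `pollUntil` and its options are assumptions for illustration; the actual test helper is `waitForFailedEmailsDigest`, and `getCount` stands in for "fetch the dry-run digest and count the current mailbox's failed emails":

```javascript
// Hedged sketch: poll an async counter until it reaches the expected value,
// instead of sleeping a fixed amount and hoping the pipeline has caught up.
async function pollUntil(getCount, expected, { timeoutMs = 30_000, intervalMs = 500 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const count = await getCount();
    if (count >= expected) return count;
    if (Date.now() >= deadline) {
      // A clear diagnostic beats a confusing snapshot mismatch later.
      throw new Error(`Timed out: expected ${expected} failed emails, last saw ${count}`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

On success the caller gets the final count; on timeout the error names both the expected and last-observed values, which is what makes the failure actionable compared to a fixed sleep.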
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| docker/dependencies/freestyle-mock/Dockerfile | Adds worker.mjs + a worker pool in server.mjs to parallelize script execution and recycle workers periodically. |
| apps/e2e/tests/backend/endpoints/api/v1/internal/failed-emails-digest.test.ts | Adds a digest polling helper and updates the dry-run test to wait for expected failures instead of sleeping. |
Comments suppressed due to low confidence (1)
apps/e2e/tests/backend/endpoints/api/v1/internal/failed-emails-digest.test.ts:167
- Indentation in this block is inconsistent (e.g., `expect(response.status)...` and the following `if`/`else` are indented deeper than the surrounding statements). Since apps/e2e runs ESLint with the `indent: ["error", 2, ...]` rule, this will likely fail `pnpm lint` for the e2e package; please reformat this section to match the established 2-space indentation levels.
```ts
const { response, batches: mockProjectFailedEmails } = await waitForFailedEmailsDigest(2);
expect(response.status).toBe(200);
if (process.env.STACK_TEST_SOURCE_OF_TRUTH === "true") {
  expect(mockProjectFailedEmails).toMatchInlineSnapshot(`[]`);
} else {
  expect(mockProjectFailedEmails).toMatchInlineSnapshot(`
```
Greptile Summary
This PR tackles test flakiness caused by the freestyle mock being unable to service concurrent email-rendering requests. The single-threaded server is replaced with a `worker_threads` pool.
Confidence Score: 5/5
Safe to merge: the worker pool replaces a well-understood single-threaded bottleneck, all error paths send an HTTP response, and previously flagged concerns are now addressed. The Dockerfile changes are a self-contained testing mock with no production data paths; the multi-worker logic is well-commented and correctly handles worker recycling, timeouts, and error/exit races. The test change removes a fragile fixed delay in favour of a polling helper that throws a clear diagnostic on timeout. No files require special attention beyond the minor polling-condition note in the test helper.

Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as HTTP Client
    participant Server as server.mjs
    participant Pool as WorkerPool
    participant Worker as worker.mjs (thread)
    participant FS as Filesystem
    Client->>Server: POST /execute/v2/script
    Server->>FS: mkdir(workDir), writeFile(script.mjs), writeFile(package.json)
    alt needs extra modules
        Server->>Server: runNpmInstall(workDir) [60s timeout]
    end
    Server->>Pool: pool.run({ scriptFile, envVars }) [30s timeout]
    alt idle slot available
        Pool->>Worker: postMessage({ scriptFile, envVars })
        Worker->>FS: import file://scriptFile
        Worker->>Worker: execute, capture logs
        Worker->>Pool: postMessage({ status, result/error, logs })
    else all slots busy
        Pool->>Pool: enqueue job, wait for slot
    end
    Pool->>Server: resolve/reject
    Server->>Client: 200/500 JSON response
    Server->>FS: rm(workDir) [finally]
    note over Pool,Worker: After MAX_JOBS_PER_WORKER=50 jobs, slot.draining=true, terminate, exit, spawnSlot()
```
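The slot-selection and FIFO-queue mechanics in the diagram can be sketched without `worker_threads`. In this hedged sketch each "slot" just runs an async function; the real pool posts the job to a worker thread and recycles the worker (via the `draining` flag) after `MAX_JOBS_PER_WORKER` jobs:

```javascript
// Minimal queued pool: pick an idle, non-draining slot or leave the job
// queued until one frees up. `runJob` is the caller-supplied dispatcher.
class QueuedPool {
  constructor(size, runJob) {
    this.runJob = runJob;
    this.slots = Array.from({ length: size }, () => ({ busy: false, draining: false }));
    this.queue = [];
  }

  run(job) {
    return new Promise((resolve, reject) => {
      this.queue.push({ job, resolve, reject });
      this.dispatchNext();
    });
  }

  dispatchNext() {
    if (this.queue.length === 0) return;
    const slot = this.slots.find((s) => !s.busy && !s.draining);
    if (!slot) return; // all slots busy or draining; job stays queued
    const { job, resolve, reject } = this.queue.shift();
    slot.busy = true;
    Promise.resolve(this.runJob(job))
      .then(resolve, reject)
      .finally(() => {
        slot.busy = false;
        this.dispatchNext(); // a slot just freed up; pull the next queued job
      });
  }
}
```

Setting `draining` before a worker is terminated (as the diagram's recycling note describes) is what keeps `dispatchNext` from handing a job to a slot that is about to exit.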
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
apps/e2e/tests/backend/endpoints/api/v1/internal/failed-emails-digest.test.ts:53
The polling exit condition only inspects `batches[0]?.emails.length`, but `selectBatchesForCurrentMailbox` can return multiple batches when the same owner mailbox has failed emails across more than one project (e.g. across the 10 repeats of the outer test). In that case the loop exits as soon as *any* first batch reaches the threshold — which may not be the current test's project — and the returned `batches` array then contains extra entries, causing the inline snapshot assertion to fail with a confusing multi-batch diff instead of a clear "wrong project" error. Scoping the check to the total across all current-mailbox batches would avoid this.
```suggestion
while (batches.reduce((sum, b) => sum + b.emails.length, 0) < expectedFailedEmailCount) {
```
### Issue 2 of 2
docker/dependencies/freestyle-mock/Dockerfile:57
**Stale console patch leaks between jobs on uncaught async rejection**
The `finally` block restores `console[method]` only when the `try` block resolves or throws synchronously. If user code spawns a detached `Promise` that rejects *after* `parentPort.postMessage(response)` fires (i.e. after the `finally` block has already run and the next job has started patching `console`), that rejection will execute inside the next job's patched console — so the next job's `logs` array picks up log lines that belong to the previous job. This is unlikely in practice for email-rendering scripts, but the issue exists structurally because the `parentPort.on("message", async …)` handler is one long async function and a stale floating promise can outlive the `finally`.
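One structural way around this (an alternative sketch, not what the PR does) is to route captured logs through `AsyncLocalStorage` instead of patch-and-restore: the console is patched once, and each job's logs array travels with the async context, so even a late floating promise writes to the array of the job that spawned it:

```javascript
import { AsyncLocalStorage } from "async_hooks";

const logStore = new AsyncLocalStorage();
const originalLog = console.log;
// Patch once at startup; per-job state lives in the async context, not in the patch.
console.log = (...args) => {
  const logs = logStore.getStore();
  if (logs) logs.push(args.join(" "));
  originalLog(...args);
};

// Each job gets its own logs array; awaits and detached promises started
// inside the job inherit the same store, so late logs stay attributed.
function runJob(fn) {
  const logs = [];
  return logStore.run(logs, async () => {
    await fn();
    return logs;
  });
}
```

The design trade-off: no restore step exists to race against, at the cost of a small per-call context lookup.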
Reviews (2): Last reviewed commit: "feat(tests): multi-worker freestyle mock"
Actionable comments posted: 3
🧹 Nitpick comments (2)
apps/e2e/tests/backend/endpoints/api/v1/internal/failed-emails-digest.test.ts (1)
38-61: 💤 Low value
Consider clarifying the single-batch assumption.

Line 53 checks only `batches[0]?.emails.length`, which assumes at most one batch per mailbox will be returned. While this appears correct given the test expectations (lines 167-188 show a single batch), the function comment could explicitly state this assumption for future maintainers.

📝 Optional comment enhancement

```diff
 // Polls the failed-emails-digest until the current mailbox's batch contains
 // at least `expectedFailedEmailCount` emails, or `timeoutMs` elapses. Replaces
 // a fixed sleep that flaked when the SMTP failure pipeline (DNS lookup +
 // connection attempt + outbox status update) hadn't recorded all failures yet.
+// Note: Checks only the first batch, assuming one batch per mailbox owner.
 async function waitForFailedEmailsDigest(
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/e2e/tests/backend/endpoints/api/v1/internal/failed-emails-digest.test.ts` around lines 38-61: Update the comment above waitForFailedEmailsDigest to explicitly state the single-batch assumption: note that the loop checks batches[0]?.emails.length and therefore expects at most one batch per mailbox (as used by selectBatchesForCurrentMailbox and the tests). Mention that if multiple batches could be returned in the future the logic should be revised (e.g., aggregate across batches or handle multiple indices) so maintainers know why batches[0] is used.

docker/dependencies/freestyle-mock/Dockerfile (1)
38-43: ⚡ Quick win
Log capture stringifies objects to `"[object Object]"`.

`args.map(String).join(" ")` collapses any non-primitive argument (objects, errors, arrays) to `[object Object]` / `Error: ...` with no fields, which makes the captured `logs` substantially less useful when tests assert on script output. Node's default console uses `util.format`, which preserves this detail and matches behavior of the un-patched console.

♻️ Proposed fix

```diff
 import { parentPort } from "worker_threads";
+import { format } from "util";
@@
 logMethods.forEach((method) => {
   console[method] = (...args) => {
-    logs.push({ message: args.map(String).join(" "), type: method });
+    logs.push({ message: format(...args), type: method });
     originalConsole[method](...args);
   };
 });
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docker/dependencies/freestyle-mock/Dockerfile` around lines 38-43: The captured logs currently stringify arguments with args.map(String).join(" "), losing object/Error detail; change the console patch inside the logMethods.forEach to use Node's util.format so captured messages match the unpatched console: require or import util.format (e.g. const { format } = require('util') or import { format } from 'util') and replace args.map(String).join(" ") with format(...args) when calling logs.push({ message: ..., type: method }), keeping the originalConsole[method](...args) call intact.
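The difference is easy to see in a standalone check (this is a demonstration, not the mock's code):

```javascript
import { format } from "util";

// String() collapses objects; util.format preserves their fields, matching
// what the un-patched console would print.
const arg = { code: "ETIMEDOUT", attempts: 3 };
const collapsed = [arg].map(String).join(" "); // "[object Object]"
const preserved = format(arg);                 // includes code and attempts
console.log(collapsed);
console.log(preserved);
```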
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docker/dependencies/freestyle-mock/Dockerfile`:
- Around line 136-181: The `failInFlight` path clears slot.busy but doesn't mark
the slot draining, so a concurrent dispatch can claim a worker that's about to
exit; update the code so that `slot.draining = true` is set when a worker fails
(either inside `failInFlight` or immediately in the `worker.on("error", ...)`
handler before calling `failInFlight`), ensuring `dispatchNext()` will skip the
slot until the `exit` handler replaces it; refer to the `failInFlight` function,
the `worker.on("error", (err) => { ... })` handler, and the `slot.draining` flag
when making the change.
- Around line 258-261: The call to pool.run({ scriptFile, envVars }) can hang
forever if the user script never posts a message, so wrap/race the pool.run
promise with a timeout and on timeout forcefully terminate the associated worker
so the worker 'exit' handler can respawn it and clear slot.busy; specifically,
implement a timedRace around pool.run that rejects after a configurable TIMEOUT,
and on that rejection call the worker termination method used by the pool (so
the exit/cleanup logic that checks MAX_JOBS_PER_WORKER and clears slot.busy
runs), ensuring pool.run's rejection path also marks the slot free and surfaces
a clear timeout error to the caller.
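The race described above might look like this sketch, where `onTimeout` stands in for terminating the slot's worker so its `exit` handler can respawn it (the helper name and callback are assumptions):

```javascript
// Race a job promise against a timer. On timeout, invoke the cleanup
// callback (e.g. worker.terminate() in the real pool) and reject with a
// clear error; on either outcome, clear the timer so it can't fire late.
function withTimeout(promise, ms, onTimeout) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      onTimeout();
      reject(new Error(`Script timed out after ${ms}ms`));
    }, ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

The `finally(clearTimeout)` matters: without it, a fast job would leave a live timer that later fires `onTimeout` against a worker that is already running someone else's job.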
- Around line 244-256: The npm install spawn block can hang because it lacks an
'error' listener, doesn't consume stdio pipes, and has no timeout; update the
logic around spawn("npm", ["install"], { cwd: workDir, stdio: "pipe" }) (used
when needsInstall && Object.keys(requestedNodeModules).length) to add
installProcess.on("error", ...) to reject immediately on spawn failures, pipe or
forward installProcess.stdout and installProcess.stderr to a logger or to
process.stdout/stderr to avoid filling OS buffers, implement a configurable
timeout timer that kills installProcess and rejects if install takes too long,
and ensure you clean up listeners and timer in the installProcess.on("close",
...) handler so the promise always resolves or rejects and doesn't leak
resources.
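A hardened install step along the lines the review asks for could be sketched as follows ('error' listener, consumed stdio, kill-on-timeout); the function name matches the `runNpmInstall` mentioned in the walkthrough, but the exact shape here is an assumption:

```javascript
import { spawn } from "child_process";

// Run `npm install` in workDir with a hard timeout. Consuming stdout/stderr
// prevents the OS pipe buffers from filling and stalling npm; the 'error'
// listener catches spawn failures (e.g. missing binary or cwd).
function runNpmInstall(workDir, timeoutMs = 60_000) {
  return new Promise((resolve, reject) => {
    const child = spawn("npm", ["install"], { cwd: workDir, stdio: "pipe" });
    const chunks = [];
    child.stdout.on("data", (d) => chunks.push(d));
    child.stderr.on("data", (d) => chunks.push(d));
    const timer = setTimeout(() => {
      child.kill("SIGKILL");
      reject(new Error(`npm install timed out after ${timeoutMs}ms`));
    }, timeoutMs);
    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err); // spawn itself failed; 'close' may still fire, extra settle is a no-op
    });
    child.on("close", (code) => {
      clearTimeout(timer);
      if (code === 0) resolve();
      else reject(new Error(`npm install exited ${code}: ${Buffer.concat(chunks)}`));
    });
  });
}
```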
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d6ad83eb-3f40-47f5-97dc-08abff6e76de
📒 Files selected for processing (2)
apps/e2e/tests/backend/endpoints/api/v1/internal/failed-emails-digest.test.ts
docker/dependencies/freestyle-mock/Dockerfile
Force-pushed from 0187a5d to 39db261
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docker/dependencies/freestyle-mock/Dockerfile (1)
45-66: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Restore the full worker environment after each job.

This only snapshots keys from `envVars`. If a script mutates or deletes any other `process.env` entry, that state leaks into later requests handled by the same recycled worker, which can reintroduce cross-test flakiness.

🛡️ Proposed fix

```diff
-  const previousEnv = new Map();
-  Object.entries(envVars || {}).forEach(([key, value]) => {
-    previousEnv.set(key, process.env[key]);
+  const previousEnv = { ...process.env };
+  Object.entries(envVars || {}).forEach(([key, value]) => {
     process.env[key] = value;
   });
@@
-  previousEnv.forEach((value, key) => {
-    if (value === undefined) {
-      delete process.env[key];
-    } else {
-      process.env[key] = value;
-    }
-  });
+  Object.keys(process.env).forEach((key) => {
+    if (!(key in previousEnv)) delete process.env[key];
+  });
+  Object.entries(previousEnv).forEach(([key, value]) => {
+    process.env[key] = value;
+  });
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docker/dependencies/freestyle-mock/Dockerfile` around lines 45 - 66, The current snapshot logic only saves keys listed in envVars (previousEnv) so any mutations/deletions to other process.env entries by the user code leak across recycled workers; instead, capture a full copy of process.env before importing/executing the user script (e.g., create a complete snapshot object of process.env before the userModule import/execute around scriptFile/userFunction) and in the finally block fully restore the environment by removing keys added during execution and restoring original values from that full snapshot (clear current process.env then reassign original keys), ensuring the worker's environment is identical after each job.
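The full-snapshot restore could be factored as a helper; this is a sketch of the idea (the actual worker inlines the logic rather than using a `withEnv` wrapper):

```javascript
// Snapshot the whole environment, apply the job's envVars, run the job,
// then delete keys the job added and reinstate every original value.
function withEnv(envVars, fn) {
  const snapshot = { ...process.env };
  Object.assign(process.env, envVars);
  try {
    return fn();
  } finally {
    for (const key of Object.keys(process.env)) {
      if (!(key in snapshot)) delete process.env[key];
    }
    Object.assign(process.env, snapshot);
  }
}
```

Unlike the per-key `Map` approach, this also undoes mutations and deletions the job made to variables it was never given, which is exactly the leak the review describes.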
🧹 Nitpick comments (1)
docker/dependencies/freestyle-mock/Dockerfile (1)
87-103: ⚡ Quick win
Use container-aware parallelism for `POOL_SIZE`.

`cpus().length` returns the host system's total logical cores regardless of cgroup limits, which can cause worker pool oversubscription in CI containers. Node.js 22 provides `availableParallelism()` for this; it respects Docker `--cpus`, Kubernetes cpu limits, and cgroup constraints (both v1 and v2).

♻️ Proposed fix

```diff
-import { cpus } from "os";
+import { availableParallelism } from "os";
@@
-const POOL_SIZE = Math.max(2, cpus().length);
+const POOL_SIZE = Math.max(2, availableParallelism());
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docker/dependencies/freestyle-mock/Dockerfile` around lines 87 - 103, The POOL_SIZE calculation uses cpus().length which ignores cgroup limits; change it to prefer os.availableParallelism() when present and fall back to cpus().length, then clamp with Math.max(2, ...). Update the code that defines POOL_SIZE (the POOL_SIZE constant) to call availableParallelism() safely (e.g., typeof availableParallelism === "function" ? availableParallelism() : cpus().length) and keep the existing Math.max(2, ...) clamp so the pool remains at least 2; reference the os import and the POOL_SIZE constant to locate where to modify.
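Standalone, the guarded version suggested in the prompt looks like:

```javascript
import os from "os";

// Prefer the cgroup-aware availableParallelism() when present, fall back to
// cpus().length on older Node, and clamp to at least 2 as the mock does.
const POOL_SIZE = Math.max(
  2,
  typeof os.availableParallelism === "function" ? os.availableParallelism() : os.cpus().length,
);
console.log(POOL_SIZE);
```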
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3d7b77fb-7493-469d-b369-d95928ec4329
📒 Files selected for processing (2)
apps/e2e/tests/backend/endpoints/api/v1/internal/failed-emails-digest.test.ts
docker/dependencies/freestyle-mock/Dockerfile
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/e2e/tests/backend/endpoints/api/v1/internal/failed-emails-digest.test.ts
Lots of flakiness comes from email polling leading to timeouts. This usually happens when the freestyle mock cannot service requests in time. The old mock was single-threaded and so got clogged up by a lot of requests. A multi-worker system should be better.
Force-pushed from 39db261 to 30853e8
@greptileai could you please re-review this PR? Do an exhaustive review pass over the changes.
Signing up with an email is well tested elsewhere. Having the assertion here doesn't actually give us any extra coverage, but it does increase the load on the mock.
…1432)

## Summary
The multi-worker freestyle-mock rewrite (#1430) hardcoded `server.listen(8080)`, which collides with qstash inside the local-emulator container. Supervisord sets `PORT=8180` for freestyle-mock specifically to avoid this clash, but the new source ignores `process.env.PORT`. The local-emulator Dockerfile previously bridged this with a `server.replace('server.listen(8080)', ...)` string-patch on the embedded source. The new code is `server.listen(8080, () => { ... })`; the literal `'server.listen(8080)'` substring no longer matches, so the replace silently no-ops and freestyle-mock binds 8080. qstash then can't start (`address already in use: 127.0.0.1:8080` → FATAL), the backend (which depends on qstash) never comes up, and the emulator smoke test times out.

Observed in [this run](https://github.com/hexclave/stack-auth/actions/runs/25832479377):

```
smoke-test: FTL address already in use: 127.0.0.1:8080
smoke-test: WARN exited: qstash (exit status 1; not expected)
smoke-test: INFO gave up: qstash entered FATAL state, too many start retries too quickly
[603s] SMOKE TEST FAILED: backend /health?db=1 did not return 200 within 300s
```

## Changes
- `docker/dependencies/freestyle-mock/Dockerfile`: `server.listen(PORT)` where `PORT = process.env.PORT || 8080`, plus the startup log reflects the actual port.
- `docker/local-emulator/Dockerfile`: drop the now-redundant string-replace for the listen call. The two remaining replaces (`fs/promises` import + node_modules symlink) are unrelated and kept.

## Test plan
- [ ] QEMU emulator build workflow passes on this branch (smoke test reaches healthy backend).
- [ ] Verify locally that supervisord's `PORT=8180` is honored by freestyle-mock and qstash binds 8080 cleanly.

## Summary by CodeRabbit
* **Chores**
  * Server listening port is now configurable via PORT (default 8080).
  * Local emulator startup adjusted to better handle dependencies and create a node_modules symlink for smoother local runs.
  * Seed/process transaction timeout increased to 90s for reliability.
  * Local database statement timeout changed to 0 (no statement timeout).
* **CI**
  * Added step to enable and validate KVM access during emulator builds.
Context
Lots of flakiness comes from email polling leading to timeouts. This usually happens when the freestyle mock cannot service requests in time. The old mock was single-threaded and so got clogged up by a lot of requests.
Summary of Changes
A multi-worker system should be better.